-
Notifications
You must be signed in to change notification settings - Fork 52
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Windows support #559
Windows support #559
Conversation
The main.js file from conda-store-ui contains an emoji character that cannot be encoded correctly unless the encoding is set to utf-8, which is not always the default on Windows.
✅ Deploy Preview for kaleidoscopic-dango-0cf31d canceled.
|
Windows uses a different type of permissions system than Unix. For now, using None will cause action_set_conda_prefix_permissions to do nothing.
As far as I can tell, with the latest changes, the server process runs just fine on Windows. However, the worker process does not (#507 (comment)), so nothing substantial actually works. However, I am able to login and to click on the various UI elements, and everything that doesn't require a worker seems to function (there's no actual indication that there are no workers, but this is a separate issue that has been mentioned elsewhere). |
1000 hits the sqlite3 max expression depth and results in an error.
One issue I am running into is that installs are failing with errors like
This is a known issue in conda. See conda/conda#7203. Windows, by default, only allows paths up to 260 characters. See https://learn.microsoft.com/en-us/windows/win32/fileio/maximum-file-path-limitation It is possible to remove this limitation in Windows 10 and later if you set a registry value https://learn.microsoft.com/en-us/windows/win32/fileio/maximum-file-path-limitation?tabs=registry#enable-long-paths-in-windows-10-version-1607-and-later. However, this (presumably) requires administrator privileges so I don't know if it's a viable thing to require. The exact length of the path depends on the specific conda package. However, it's pretty obvious that the issue is the long UUID generated by conda-store: |
At any rate, I think I am going to work on finishing up #554 as that should at least allow me to install some test packages so I can work on the remaining Windows issues (I guess I could also just set that registry value on my system, but that PR needs to be finished anyway). |
Based on this comment, conda/conda#7203 (comment), if we want to try to make the path no longer than a typical conda install default, the part of the path between Of course, most packages don't have really long paths, but some do. |
This disables --beat on Windows and sets FORKED_BY_MULTIPROCESSING=1, which is required to make the default celery pool work on Windows.
os.path.join is incorrect for constructing a URL on Windows because URLs should always use forward slashes regardless of platform.
@nkaretnikov to test |
|
||
# Round up | ||
n_blocks = (apparent_total_bytes + 511) // 512 | ||
return n_blocks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do more tests tomorrow. So far: tested on Linux against du -sb
, returns wrong results for /bin
. For /bin/ls
it seems to work. I'll write a test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, this is a bit tricky.
First, we incorrectly calculate file size on macOS with du
. There's no built-in set of options that can give us the equivalent of -sb
AFAICT. We could install du
from GNU coreutils, which is what is used on Linux, then we'd get the same size as our script. But I'd suggest we just use our Python code to calculate the size on all systems. However, it would still not match what Finder is showing, but I think it's close enough to simply ignore. Different tools calculate this differently and it's not hugely important to users unless we're in the same ballpark.
I also had to remove the round up calculation from the original script. Also, simplified it.
I also haven't tested this script with symlink/filesystem loops. Is this a safe assumption to make that we won't run into this where this is supposed to be used?
My updated code + some tests: https://gist.github.com/nkaretnikov/1a66b90a74fa805f1022e90252e54c87
Note: uncomment the TemporaryDirectory line again and remove two lines after it, which were added for testing on Windows:
# with tempfile.TemporaryDirectory() as dir:
dir = "c:\\tmpdir"
os.makedirs(dir)
On Windows, I've also attempted to test against du from SysInternals, but it also shows different info. It hasn't been updated in a while, so I just ignored that.
In general, I suggest we rely on what OS built-in "File info" tools return to debug this.
On Linux, the updated script matches the du command we're using.
Windows (updated Python du matches native File info size, Sysinternals du shows different info):
macOS (updated Python du roughly matches native File info, built-in du returns different size):
macOS (updated Python du matches du from GNU coreutils, installed via brew):
Note: creating symlinks on Windows requires having Developer Mode on:
Settings > Privacy & Security > For Developers and turn Developer Mode to on
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I tested it on my Mac, du and this function gave the same numbers. I'm not sure if the subtle differences of blocks matter much (it matters if there are sparse or compressed files, but I doubt those would show up). The main thing is that we treat hard links correctly.
Note that in my tests, this function is significantly slower than du
, so we should consider whether it's worth using it over du.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that in my tests, this function is significantly slower than du, so we should consider whether it's worth using it over du.
Good point. On which dir did you test, can you post the numbers?
When I tested it on my Mac, du and this function gave the same numbers.
How did you test it? Did you test against the built-in du? What filesystem did you test on?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tested it on my conda-store-state with a couple of environments:
$ du -sAB1 conda-store-state/
1111657 conda-store-state/
>>> from conda_store_server.utils import du
>>> du('conda-store-state')
1111657
That's on my Mac. I also can confirm that some of the environments do share files via hard-links:
$ ls -i conda-store-state/admin/b82cde5b5489ceffd8a8589ebd73f20a9f4836260b18295f49e057c441b235dc-20231005-213031-558981-2-test2/lib/libsqlite3.0.dylib
256048028 conda-store-state/admin/b82cde5b5489ceffd8a8589ebd73f20a9f4836260b18295f49e057c441b235dc-20231005-213031-558981-2-test2/lib/libsqlite3.0.dylib
$ ls -i conda-store-state/admin/99108419ad0fd922fdeff9bbc434b58d41f68e3f923a83f6a7ab19568463bc82-20231005-211948-613237-1-test/lib/libsqlite3.0.dylib
256048028 conda-store-state/admin/99108419ad0fd922fdeff9bbc434b58d41f68e3f923a83f6a7ab19568463bc82-20231005-211948-613237-1-test/lib/libsqlite3.0.dylib
I don't remember if I tested it on Linux, so it's possible there's a discrepancy there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I pushed a fix for this. Note that the numbers will be off a little bit (by less than 512) on Mac because du
rounds up to the nearest multiple of 512 when converting bytes to blocks.
However, we still need to figure out something for this du()
function. I just tested it against my main ~/anaconda
directory and I had to exit out of it after several minutes. I think it might be accidentally quadratic.
There is a Windows du
command someone at https://learn.microsoft.com/en-us/sysinternals/downloads/du. Maybe we should just package that as a conda package so we can just use it directly (or maybe it already is packaged?).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually even regular du -sAB1 ~/anaconda
is slow on my machine. Maybe it's just too hard to deal with that many hard links.
But I just noticed that the way conda-store is using this, it gathers the stats for each prefix as it is created. I need to double check it, but I think it actually isn't accounting for hard links across environments correctly anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Honestly, I'd appreciate some guidance from @costrouc or someone else on how disk_usage is actually used in conda-store before I feel comfortable with the du stuff here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a Windows du command someone at https://learn.microsoft.com/en-us/sysinternals/downloads/du. Maybe we should just package that as a conda package so we can just use it directly (or maybe it already is packaged?).
I've already tried it. See above where I post my du test results. Sysinternals du printed wrong results for me, compared to Windows file explorer. The output also differs from Linux/macOS du visually.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I didn't notice you tested it already.
Regarding what the Finder returns, I would be curious to know exactly why the discrepancy exists. You can access it programmatically with AppleScript
osascript -e 'tell application "Finder" to get physical size of folder "Macintosh HD:Users:aaronmeurer:Documents:conda-store:conda-store-server:conda-store-state"'
6.6859008E+8
(I highly recommend using an LLM to help you write AppleScript)
@trallard How urgent is this needed? Personally, I'd rather get Windows working on CI first, see: #603 @asmeurer Do conda-store-server tests pass for you with @asmeurer What's known to work? What doesn't work? I'll take a closer look tomorrow. For now, I can say that running the main UI at |
Quite a few tests fail for me too. I forgot to mention that 0.0.0.0 doesn't work, but 127.0.0.1 or localhost does. I think it has something to do with the default way IPs are configured on Windows. |
# The default Celery pool requires this on Windows. See | ||
# https://stackoverflow.com/questions/37255548/how-to-run-celery-on-windows | ||
if sys.platform == "win32": | ||
os.environ.setdefault('FORKED_BY_MULTIPROCESSING', '1') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It makes me worried that we rely on a library that has no official Windows support.
It's worth discussing the level of Windows support we expect to provide ourselves and possibly consider alternatives to celery.
I did attempt to test this by running my concurrency test from 3fc0e14. I also parameterized it by all pools from get_available_pool_names
. All of them failed. This could be for a number of reasons, but it's not worth investigating at the moment due to 50% of the testsuite failing for me. Also: pytest celery fixture might require additional tweaks to get working.
Before we start talking about celery or concurrency, I suggest we get the rest of the testsuite working on Windows.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most of the other pools fail because they aren't actually concurrent on Windows, and one of the tasks blocks all the others (I think the watch task, but I'm not completely sure). But even if that weren't the case, conda-store would be very slow without concurrent tasks due to some slow tasks like updating channels.
This comment was marked as outdated.
This comment was marked as outdated.
@asmeurer @trallard This concludes my review for now, here's the summary (you can skip the rest of the comments if you just want a high-level overview):
|
Realized I was running the server incorrectly to test env building, so a quick update. Building a simple env failed again, but for a different reason:
Env:
Backtrace in Build log (via web UI):
Command to start the server: |
The "filename or extension is too long" error is described here #559 (comment). For now, you'll have to set the registry key to enable long file paths to test this on Windows. This is tracked by #588 |
The cookie length was incorrectly using .seconds instead of .total_seconds(). This was only a problem on Windows because Windows has a ~16 ms precision for timedeltas, meaning the .seconds attribute for the difference was sometimes being set to 0.
with: | ||
activate-environment: conda-store-server-dev | ||
environment-file: conda-store-server/environment-windows-dev.yaml | ||
auto-activate-base: false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Heads up: #634 switches to miniforge. This will need to be updated to match the rest of the jobs once that PR lands.
@@ -18,7 +18,7 @@ def wrapper(*args, **kwargs): | |||
# redirect stdout -> action_context.stdout | |||
stack.enter_context(contextlib.redirect_stdout(action_context.stdout)) | |||
|
|||
# redirect stderr -> action_context.stdout | |||
# redirect stderr -> action_context.stderr |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The line below uses action_context.stdout
, so this change is wrong. Or the line below needs to be updated.
@@ -9,6 +9,8 @@ | |||
from conda_store_server import conda_utils, utils | |||
from pydantic import BaseModel, Field, ValidationError, constr, validator | |||
|
|||
ON_WIN = sys.platform.startswith("win") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is already declared in another file. Can we move that definition somewhere where it makes the most sense and import everywhere? Ideally, this should be done in a way that minimizes potential import cycles.
@@ -194,20 +196,20 @@ class Settings(BaseModel): | |||
metadata={"global": True}, | |||
) | |||
|
|||
default_uid: int = Field( | |||
os.getuid(), | |||
default_uid: int | None = Field( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This syntax is Python 3.10+ IIUC. Let's use Optional[int]
here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually this should be fine, but we need to add from __future__ import annotations
. I guess we aren't testing old Python versions anywhere. What's the oldest version conda-store should support.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know what the minimal version is. The point I was trying to make: we don't use it anywhere else. I don't see why we should introduce this and immediately bump our lowest supported version to 3.10+ or require an extra import. Let's just use Optional, like everywhere else.
address = "localhost" if self.address == "0.0.0.0" else self.address | ||
print( | ||
f"Starting standalone conda-store server at http://{address}:{self.port}" | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like this. If we're using self.address
, we should print self.address
. I understand this is trying to be helpful, but it'll only make issues harder to debug later. I'd suggest to remove this entirely.
name: conda-store-server-dev | ||
channels: | ||
- conda-forge | ||
- Microsoft |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR #634 made changes to yaml files, please update the Windows one to match those changes once 634 lands.
@@ -223,7 +224,7 @@ def get(self, key): | |||
return f.read() | |||
|
|||
def get_url(self, key): | |||
return os.path.join(self.storage_url, key) | |||
return posixpath.join(self.storage_url, key) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you remember what failed without this? The effect of this change will be converting Windows paths to posixpaths:
>>> import posixpath
>>>
>>> posixpath.join("foo", "bar")
'foo/bar'
>>>
>>> import os
>>> os.path.join("foo","bar")
'foo\\bar'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Windows backslash paths are not correct for file URLs. Without this change, some of the buttons in the UI don't work because they use %5C
(URL encoded \
) instead of /
.
if os.path.islink(dp): | ||
apparent_total_bytes += os.lstat(dp).st_size | ||
|
||
return apparent_total_bytes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you seen an implementation I shared here? #559 (comment)
It fixes some style issues and the code is shorter. I also find it more readable because variables are shorter. The first two ifs can be written as one.
Direct link to gist: https://gist.github.com/nkaretnikov/1a66b90a74fa805f1022e90252e54c87
There are also tests there that you could re-use, like the file/directory creation logic. Ignore the part where I check against tools, it should just check against hardcoded numbers. But it's not a blocker anyway.
val = disk_usage(test_file) | ||
assert isinstance(val, str) | ||
assert 1000 <= int(val) <= 1500 | ||
assert 1000 <= du(test_file) <= 1500 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Our test file is 1000 bytes. A bit concerning that our margin is 50%. On multiple files, it'll add up quickly. Perhaps having a tighter range would be possible if we had an if-then-else for each platform.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The margin would be less if the files were bigger. It has to do with block rounding and the size of a directory. The exact value depends on the platform and filesystem.
@asmeurer I've run this on Windows on 2216f47. It built a simple env in standalone mode. I had to set 1 in the registry to allow for longer paths, but that's expected. I've also seen this error, not sure whether it's something we should worry about for now:
I'm waiting for you to rebase and address some of my comments. But overall this looks good, this is initial Windows support. We can fix other things in follow-ups. Thanks! |
Regarding the CI, hopefully the other changes don't break things here. I had to do quite a few workarounds to get stuff working on CI, due to issues with defaults/conda-forge and also some issues with mamba. On the other hand, it's also possible the new miniforge stuff will make some of my workaround here no longer necessary, so we should definitely try removing them. |
Taking over. See #640. |
You'd need to debug to see the real source of the error. |
See #507
Nothing is working yet, but I plan to push changes up here as I make them.
Description
This pull request:
Pull request checklist
Additional information